SDSS, LSST AND GAIA: LESSONS AND SYNERGIES

Mario Jurić and Željko Ivezić



Abstract. The advent of deep, wide, accurate, digital photometric surveys,
exemplified by the Sloan Digital Sky Survey (SDSS), has had a profound impact
on studies of the Milky Way. In the past decade, we have transitioned from a
scarcity to an (over)abundance of precise, well-calibrated observations of
stars over a large fraction of the Galaxy. The avalanche of data will continue
throughout this decade, culminating with Gaia and LSST. This new reality will
necessitate changes in methodology, habits, and expectations, both on the side
of the large survey projects and on that of the astrophysics community at
large. We argue, based on the experience with SDSS, that surveys should release
data as early and as often as possible, incorporating incremental improvements
in each subsequent release, as opposed to holding off for a single, big, final
release. The scientific community will need to reciprocate by performing
analyses (and re-analyses) appropriate to the current fidelity of the released
data, understanding that these are continually evolving and improving products.
1 Introduction

In the past decade, the data from the Sloan Digital Sky Survey (SDSS) have
dramatically enhanced our picture of the Milky Way. This includes comprehensive
Galactic structure studies (e.g. Carollo et al. 2007; Jurić et al. 2008; Ivezić
et al. 2008a; Bond et al. 2010), discoveries of overdensities and tidal streams
(e.g. Newberg et al. 2002; Belokurov et al. 2006; Jurić et al. 2008), as well as
discoveries of a significant population of ultra-faint Milky Way satellites
(e.g. Willman et al. 2005; Belokurov et al. 2007, and others). An example from
Jurić et al. (2008), given in Figure 1, shows the detection of two low-contrast
disk overdensities in stellar number density maps constructed from SDSS
observations of ~48 million stars.
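The overdensities in Figure 1 appear in residual maps of the form
Data/Model - 1. The sketch below is a minimal illustration of that quantity
only. It assumes the observed star counts and the model prediction have already
been binned onto the same grid. The binning and the Galactic model fit of Jurić
et al. (2008) are not reproduced. The function names, the toy grid and the 0.2
detection threshold are hypothetical choices made for illustration.

```python
# Hypothetical sketch: fractional residuals of binned star counts against a
# smooth Galactic model, in the spirit of the "Data/Model - 1" maps of Fig. 1.

def fractional_residuals(data, model, min_model=1e-12):
    """Return Data/Model - 1 for each bin of two equal-shaped 2-D grids.

    Bins where the model predicts (essentially) no stars are returned as None,
    since the ratio is undefined there.
    """
    if len(data) != len(model) or any(len(d) != len(m) for d, m in zip(data, model)):
        raise ValueError("data and model grids must have the same shape")
    out = []
    for data_row, model_row in zip(data, model):
        out.append([
            d / m - 1.0 if m > min_model else None
            for d, m in zip(data_row, model_row)
        ])
    return out


def flag_overdensities(residuals, threshold=0.2):
    """Return (row, col) indices of bins whose residual exceeds `threshold`."""
    return [
        (i, j)
        for i, row in enumerate(residuals)
        for j, r in enumerate(row)
        if r is not None and r > threshold
    ]


# Toy 3x3 example: the model is flat and the "data" have one 50% excess bin.
model = [[100.0, 100.0, 100.0] for _ in range(3)]
data = [[100.0, 102.0, 98.0],
        [101.0, 150.0, 99.0],
        [100.0, 97.0, 103.0]]
res = fractional_residuals(data, model)
print(flag_overdensities(res))  # [(1, 1)]
```

Working with the ratio rather than the difference makes a given fractional
excess equally visible in dense inner-disk bins and sparse outer bins. A real
analysis would also have to account for the Poisson noise in each bin.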

Fig. 1. Two low-contrast disk overdensities detected in SDSS DR3 stellar number
density map residuals. The left panel shows the Data/Model - 1 residuals in the
(R, Z) plane, while the center and right panels show the X-Y cross sections
intersecting the detected overdensities. See Jurić et al. (2008) for details.

The SDSS is a digital photometric and spectroscopic survey. It has covered a
contiguous region of about 7,600 deg² centered on the North Galactic cap, a
smaller but deeper area in the Galactic South (225 deg²), and approximately
3,200 deg² of imaging for the Sloan Extension for Galactic Understanding and
Exploration (SEGUE). More generally, it is an example of the kind of deep, wide
and accurate surveys that will be increasingly common in this decade. A
non-exhaustive list includes Pan-STARRS PS1 (Kaiser et al. 2002), which
surpassed the SDSS in covered area less than 6 months after the start of its
science mission; SkyMapper (Keller et al. 2005), currently being commissioned;
the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST; Su et
al. 1998); the Dark Energy Survey (DES; Flaugher et al. 2005); and, of course,
Gaia (Perryman et al. 2001) and LSST (Tyson 2002; Ivezić et al. 2008b).
All of these, including Gaia, share the common thread of aiming to produce and
publish large (> 10⁸ objects) and wide-area datasets, a challenge the SDSS has
already faced and successfully tackled.



2 Publishing and Consuming Large Datasets: Early, Often, Iterate 

The lessons learned from SDSS are many. Here we limit ourselves to only two 
specific areas of immediate interest to Gaia and its users: deciding when and what 
to publish, and appropriately approaching the published data. 

2.1 Release Early, Release Often 

The SDSS published the first public release of its data (the "Early Data
Release"; EDR) in June 2001 (Stoughton et al. 2002). It consisted of roughly
460 deg² of imaging data, 54,000 follow-up spectra (about 5% of the planned
totals) and a photometric and astrometric catalog of ~14 million objects.

By today's SDSS Collaboration standards, the data included in this release 
would be considered substandard, even embarrassing. To illustrate this point:
at the time of the EDR, even the true filter bandpasses were not known, the
photometric calibration was at the 3-5% level (as opposed to the current
~1-2%), and the output formats and tables in the data distribution system were
far from finalized.

In spite of these deficiencies, the EDR was a major success (and was never
criticized as premature). First, it demonstrated that the survey was collecting
and processing data (an important milestone for a project 10 years in the
making). Second, the release generated invaluable feedback from the community,
which was incorporated into, and significantly strengthened, the subsequent
data releases. Most importantly, the data described in the EDR produced
substantial scientific returns, including (of importance to Galactic astronomy)
the discovery of the Monoceros Stream (Newberg et al. 2002). At the time of
writing of this contribution, the EDR paper has been cited 1215 times.

The SDSS has continued with releases on a 12-18 month schedule, with the eighth
data release (DR8) planned for December 2010. Besides adding more area, every
new release involves reprocessing all previously published data, both to
correct problems identified in the older releases and to benefit from major
improvements in the processing software (for example, the inclusion of an
improved photometric calibration algorithm; Padmanabhan et al. 2008). In the
SDSS experience, many of the problems reported by users were very subtle, and
practically the only way to discover them was to perform "cutting-edge" science
analyses and then critically examine the results.

The SDSS is far from alone in this approach. The RAVE survey (Zwitter et al.
2008), as well as UKIDSS (Lawrence et al. 2007), follows the same strategy. The
planned Large Synoptic Survey Telescope (LSST) will follow a similar yearly
release schedule. This "release early, release often" approach is also well
known in open-source software development (Raymond 1997). Early data releases
result in better communication with, and feedback from, the wider astrophysical
community, improve the quality of the published datasets, and reduce the
overall "time to science".

We feel the same model is applicable to Gaia. In addition to helping the Gaia
project team discover and correct problems as they appear, early releases will
surely generate a significant amount of follow-up science, enable synergies
with ground-based surveys such as LSST (see Section 3), and benefit the project
in terms of education and public outreach.



2.2 Understanding that Datasets Evolve 

The paradigm described above places an important responsibility on users to
take into account the evolving nature of the datasets. As noted elsewhere in
this volume (see the contribution by Hogg), the only "final" products a survey
can ever deliver are the raw images. Everything else, including the catalogs,
is a derived product and subject to change as the understanding of the
instrument, the processing algorithms, or the underlying assumptions (priors)
evolves. To the best of our knowledge, no survey has ever had all of these well
understood early on. However, the benefits of early and regular access to data
as they are collected far outweigh the difficulties associated with evolving
datasets.

The end users need to be aware of these caveats, especially when making use of
early data releases that will not have the full quality and reliability
traditionally expected from a published catalog. While this is certainly a
potential source of frustration, a good understanding of (any!) dataset and its
limits significantly improves the quality and longevity of the resulting
science. Again, early and frequent data releases not only help users understand
"the survey error bars", but, perhaps more importantly, help the project team
improve them before it is too late to "fix the problem".



3 The Galaxy with LSST and Gaia 



The Large Synoptic Survey Telescope (LSST; Ivezić et al. 2008b) will be a large,
wide-field ground-based system designed to obtain multiple images covering the
sky visible from its location at Cerro Pachón, Chile. The current baseline
design, with an 8.4 m (6.7 m effective) primary mirror, a 9.6 deg² field of
view, and a 3,200 Megapixel camera, will allow about 10,000 deg² of sky to be
covered using pairs of 15-second exposures in two photometric bands every three
nights on average. The survey area will include 30,000 deg² with δ < +34.5°,
and will be imaged multiple times in six bands, ugrizy, covering the wavelength
range 320-1050 nm. About 90% of the observing time will be devoted to a
deep-wide-fast survey mode, which will observe a 20,000 deg² region about 1000
times in six bands during the anticipated 10 years of operations. These data
will result in databases including about 20 billion objects.
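The cadence figures above can be checked with simple arithmetic. The sketch
below assumes that "pairs of 15-second exposures in two photometric bands"
means one two-exposure visit per band for each field in every three-night
cycle. That reading is an assumption. The sketch also ignores field overlaps,
readout and slew overheads, so it gives only a rough lower bound on the
open-shutter time. All names are illustrative.

```python
import math

# Baseline values quoted in the text.
FOV_DEG2 = 9.6                # camera field of view
AREA_PER_CYCLE_DEG2 = 10_000  # sky covered every ~3 nights
NIGHTS_PER_CYCLE = 3
EXPOSURE_S = 15               # single exposure length
EXPOSURES_PER_VISIT = 2       # "pairs of 15-second exposures"
VISITS_PER_FIELD = 2          # assumption: one visit in each of the two bands


def n_pointings(area_deg2, fov_deg2):
    """Minimum number of pointings to tile an area, ignoring overlaps."""
    return math.ceil(area_deg2 / fov_deg2)


def open_shutter_hours(pointings, visits_per_field, exposures_per_visit, exposure_s):
    """Total exposure time in hours, ignoring readout and slew overheads."""
    return pointings * visits_per_field * exposures_per_visit * exposure_s / 3600.0


pointings = n_pointings(AREA_PER_CYCLE_DEG2, FOV_DEG2)
hours = open_shutter_hours(pointings, VISITS_PER_FIELD, EXPOSURES_PER_VISIT, EXPOSURE_S)
print(pointings)                           # 1042
print(round(hours / NIGHTS_PER_CYCLE, 1))  # 5.8 (hours of open shutter per night)

# Deep-wide-fast mode: ~1000 visits per field over 10 years.
print(1000 / 10)                           # 100.0 (visits per field per year)
```

Under these assumptions, about 1,000 pointings fill a little under six hours of
exposure per night. This is consistent with the claim that a 10,000 deg² area
can be revisited every few nights with a 9.6 deg² field of view.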

LSST will produce a massive and exquisitely accurate photometric and
astrometric dataset for about 10 billion Milky Way stars. The coverage of the
Galactic plane will yield data for numerous star-forming regions, and the
y-band data will penetrate through the interstellar dust layer. Photometric
metallicity measurements will be available for about 200 million main-sequence
F/G stars, which will sample the halo to distances of 100 kpc (Ivezić et al.
2008a). No other existing or planned survey will provide such a massive and
powerful dataset to study the outer halo (including Gaia, which is
flux-limited at r = 20, and Pan-STARRS, which will not have the u band). In its
standard surveying mode, the LSST will be able to detect RR Lyrae stars and
classical novae out to 400 kpc, and hence explore the extent and structure of
the halo out to half the distance to M31. Altogether, the LSST will enable
studies of the stellar distribution beyond the presumed edge of the Galactic
halo, of stellar metallicities throughout most of the halo, and of stellar
kinematics beyond the thick disk/halo boundary (LSST Science Collaborations et
al. 2009).
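The standard distance modulus shows why these distances require LSST's depth.
The absolute magnitudes used below are typical values assumed for illustration
and are not taken from this paper: M ≈ +0.6 for RR Lyrae stars and M_r ≈ +4 to
+5 for main-sequence F/G stars. Interstellar extinction is ignored.

```latex
m - M = 5\log_{10}\left(\frac{d}{10\,\mathrm{pc}}\right)

d = 400\,\mathrm{kpc}:\quad m - M = 5\log_{10}\left(4\times10^{4}\right) \approx 23.0,
\qquad M \approx +0.6 \;\Rightarrow\; m \approx 23.6

d = 100\,\mathrm{kpc}:\quad m - M = 5\log_{10}\left(10^{4}\right) = 20.0,
\qquad M_r \approx +4 \text{ to } +5 \;\Rightarrow\; r \approx 24 \text{ to } 25
```

Under these assumptions, main-sequence F/G stars at 100 kpc are several
magnitudes fainter than Gaia's r = 20 limit.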



In the context of Gaia, the LSST can be thought of as its deep complement. A
comparison of LSST and Gaia performance is given in Figure 2. Gaia will provide
an all-sky catalog with unsurpassed trigonometric parallax, proper motion and
photometric measurements to r ~ 20 for about 10⁹ stars. LSST will extend this
map to r ~ 27 over half of the sky, detecting about 10¹⁰ stars. Because of
Gaia's superb astrometric and photometric quality, and LSST's significantly
deeper reach, the two surveys are highly complementary: Gaia will map the Milky
Way's disk in unprecedented detail, and LSST will extend this map all the way
to the edge of the halo (Eyer et al., in prep.).

Fig. 2. A comparison of photometric, proper motion and parallax errors for
SDSS, Gaia and LSST as a function of apparent magnitude r for a G2V star (it is
assumed that r = G, where G is Gaia's broad-band magnitude). In the top panel,
the curve marked "SDSS" corresponds to a single SDSS observation. The red
curves correspond to Gaia: the long-dashed curve shows the single-transit
accuracy, and the dot-dashed curve the end-of-mission accuracy (assuming 70
transits). The blue curves correspond to LSST: the solid curve shows the
single-visit accuracy, and the short-dashed curve the accuracy for co-added
data (assuming 230 visits in the r band). The curve marked "SDSS-POSS" in the
middle panel shows the accuracy delivered by the proper motion catalog of Munn
et al. (2004). In the middle and bottom panels, the long-dashed curves
correspond to Gaia, and the solid curves to LSST. Note that LSST will smoothly
extend Gaia's error vs. magnitude curves four magnitudes fainter. The
assumptions used in these computations are described in Eyer et al. (in prep.).
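The gap between the single-visit and co-added curves in Figure 2 mostly
reflects the averaging of random errors over many epochs. The sketch below is
an idealized model and not the computation of Eyer et al. It assumes
independent measurements with equal errors, so random errors shrink as
1/sqrt(N). An optional systematic floor is added in quadrature; its value is a
free parameter, not a number from the text. The depth gain formula assumes
background-limited imaging. The model applies to epoch-averaged quantities such
as photometry only. Parallax and proper-motion errors also depend on the time
baseline and are not modeled here.

```python
import math


def coadded_error(single_epoch_error, n_epochs, floor=0.0):
    """Idealized error of the mean of n_epochs independent measurements.

    Random errors average down as 1/sqrt(N); an optional systematic floor
    (an assumed parameter) is added in quadrature and does not average down.
    """
    if n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")
    return math.sqrt(single_epoch_error ** 2 / n_epochs + floor ** 2)


def depth_gain_mag(n_epochs):
    """Gain in limiting magnitude for background-limited co-adds.

    S/N grows as sqrt(N), so the limiting flux drops by sqrt(N), i.e. the
    limiting magnitude deepens by 2.5*log10(sqrt(N)) = 1.25*log10(N).
    """
    return 1.25 * math.log10(n_epochs)


# Epoch counts quoted in the Fig. 2 caption.
for label, n in [("Gaia, 70 transits", 70), ("LSST r band, 230 visits", 230)]:
    print(f"{label}: random errors shrink by ~{math.sqrt(n):.1f}x")

print(f"LSST r-band co-add depth gain: ~{depth_gain_mag(230):.1f} mag")
```

In practice, the systematic floor sets the limit for bright stars. For this
reason, well-measured stars in real co-added catalogs rarely reach the pure
1/sqrt(N) improvement.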